-
Notifications
You must be signed in to change notification settings - Fork 256
[YUNIKORN-3118] Parallelize TryNode evaluations #1043
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1043 +/- ##
==========================================
- Coverage 81.46% 81.17% -0.29%
==========================================
Files 101 101
Lines 13282 13411 +129
==========================================
+ Hits 10820 10887 +67
- Misses 2204 2254 +50
- Partials 258 270 +12 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
pbacsko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the first round, my main concern (which might be invalid) is the performance overhead of creating too many goroutines for unschedulable pods.
That's something which I ask you to investigate using the recommended test case. But I'm open for collaboration on this matter.
|
OK, I couldn't resist and tried this locally with There is a significant decrease in throughput, which baffled me a bit, but then I realized what was going on. This test is a bit synthetic, so not really realistic. The first node in the B-tree is always a "hit", so this piece of code slows things down, because it always has to construct at least a single batch (eg. 24), then walk through the nodes even if it's not needed. The first node of the batch is always the one that we need, so we perform unnecessary verification of predicates for the remaining 23 nodes, essentially wasting CPU cycles. I'm thinking about how to makes this more efficient for this case. Even if it's not always the first node which is good for us, eg. it's the 5th node, we're still evaluating too many nodes. So, we have to have some sort of a threshold to make this improvement justifiable. I think the "first node fit" is always a case whose throughput is hard (or even impossible) to match with a parallel approach, simply due to the extra overhead. But I still suggest exploring different ways to make this more performant. |

What is this PR for?
Added a new feature to implement parallelism in tryNode evaluations
By default this runs with a single thread keeping it backward compatible. But if required, we can configure the parallelism to run the node evaluations in parallel and significantly reduce the latency.
We were able to see 2-3X performance gain when running with parallelism turned on.
What type of PR is it?
Todos
What is the Jira issue?
https://issues.apache.org/jira/browse/YUNIKORN-3118
How should this be tested?
Screenshots (if appropriate)
Questions: